{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 利用SVM、决策树、随机森林和KNN对信用卡违约率进行分析\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 数据来源及字段含义\n",
    "python已被广泛应用在数据分析领域，这回我们就用几种常见的分类算法对信用卡违约进行识别，来帮助我们更好的完成风控措施。    \n",
    "\n",
    "本次分析的数据来源:https://www.kaggle.com/uciml/default-of-credit-card-clients-dataset\n",
    "\n",
    "该数据是某银行从2005年4月至9月的信用卡数据，数据包含25个字段，具体含义如下：    \n",
    "\n",
    "| 字段 | 含义 |   \n",
    "| :---: | :-----: |    \n",
    "| ID | 客户ID |\n",
    "| LIMIT_BAL | 可透支金额 |\n",
    "| SEX | 性别（1=男，2=女） |\n",
    "| EDUCATION | 1=研究生, 2=本科, 3=高中, 4=others, 5=未知, 6=未知 |\n",
    "| MARRIGE | 婚姻状况 (1=已婚, 2=单身, 3=其它) |\n",
    "| AGE | 年龄 |\n",
    "| PAY_0 | 2005年9月还款状况 |\n",
    "| PAY_2 | 2005年8月还款状况 |\n",
    "| PAY_3 | 2005年7月还款状况 |\n",
    "| PAY_4 | 2005年6月还款状况 |\n",
    "| PAY_5 | 2005年5月还款状况 |\n",
    "| PAY_6 | 2005年4月还款状况 |\n",
    "| BILL_AMT1 | 2005年9月账单金额 |\n",
    "| BILL_AMT2 | 2005年8月账单金额 |\n",
    "| BILL_AMT3 | 2005年7月账单金额 |\n",
    "| BILL_AMT4 | 2005年6月账单金额 |\n",
    "| BILL_AMT5 | 2005年5月账单金额 |\n",
    "| BILL_AMT6 | 2005年4月账单金额 |\n",
    "| PAY_AMT1 | 2005年9月还款金额 |\n",
    "| PAY_AMT2 | 2005年8月还款金额 |\n",
    "| PAY_AMT3 | 2005年7月还款金额 |\n",
    "| PAY_AMT4 | 2005年6月还款金额 |\n",
    "| PAY_AMT5 | 2005年5月还款金额 |\n",
    "| PAY_AMT6 | 2005年4月还款金额 |\n",
    "| default.payment.next.month | 下月违约情况（1：违约，0：守约） |"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 数据描述性统计分析\n",
    "python中有相当便捷的库可以让我们了解一些数据的基本信息何进行可视化如pandas/seaborn/matplotlib等，接下来我们利用这些库进行描述性统计分析。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(30000, 25)\n",
      "                 ID       LIMIT_BAL           SEX     EDUCATION      MARRIAGE  \\\n",
      "count  30000.000000    30000.000000  30000.000000  30000.000000  30000.000000   \n",
      "mean   15000.500000   167484.322667      1.603733      1.853133      1.551867   \n",
      "std     8660.398374   129747.661567      0.489129      0.790349      0.521970   \n",
      "min        1.000000    10000.000000      1.000000      0.000000      0.000000   \n",
      "25%     7500.750000    50000.000000      1.000000      1.000000      1.000000   \n",
      "50%    15000.500000   140000.000000      2.000000      2.000000      2.000000   \n",
      "75%    22500.250000   240000.000000      2.000000      2.000000      2.000000   \n",
      "max    30000.000000  1000000.000000      2.000000      6.000000      3.000000   \n",
      "\n",
      "                AGE         PAY_0         PAY_2         PAY_3         PAY_4  \\\n",
      "count  30000.000000  30000.000000  30000.000000  30000.000000  30000.000000   \n",
      "mean      35.485500     -0.016700     -0.133767     -0.166200     -0.220667   \n",
      "std        9.217904      1.123802      1.197186      1.196868      1.169139   \n",
      "min       21.000000     -2.000000     -2.000000     -2.000000     -2.000000   \n",
      "25%       28.000000     -1.000000     -1.000000     -1.000000     -1.000000   \n",
      "50%       34.000000      0.000000      0.000000      0.000000      0.000000   \n",
      "75%       41.000000      0.000000      0.000000      0.000000      0.000000   \n",
      "max       79.000000      8.000000      8.000000      8.000000      8.000000   \n",
      "\n",
      "       ...      BILL_AMT4      BILL_AMT5      BILL_AMT6       PAY_AMT1  \\\n",
      "count  ...   30000.000000   30000.000000   30000.000000   30000.000000   \n",
      "mean   ...   43262.948967   40311.400967   38871.760400    5663.580500   \n",
      "std    ...   64332.856134   60797.155770   59554.107537   16563.280354   \n",
      "min    ... -170000.000000  -81334.000000 -339603.000000       0.000000   \n",
      "25%    ...    2326.750000    1763.000000    1256.000000    1000.000000   \n",
      "50%    ...   19052.000000   18104.500000   17071.000000    2100.000000   \n",
      "75%    ...   54506.000000   50190.500000   49198.250000    5006.000000   \n",
      "max    ...  891586.000000  927171.000000  961664.000000  873552.000000   \n",
      "\n",
      "           PAY_AMT2      PAY_AMT3       PAY_AMT4       PAY_AMT5  \\\n",
      "count  3.000000e+04   30000.00000   30000.000000   30000.000000   \n",
      "mean   5.921163e+03    5225.68150    4826.076867    4799.387633   \n",
      "std    2.304087e+04   17606.96147   15666.159744   15278.305679   \n",
      "min    0.000000e+00       0.00000       0.000000       0.000000   \n",
      "25%    8.330000e+02     390.00000     296.000000     252.500000   \n",
      "50%    2.009000e+03    1800.00000    1500.000000    1500.000000   \n",
      "75%    5.000000e+03    4505.00000    4013.250000    4031.500000   \n",
      "max    1.684259e+06  896040.00000  621000.000000  426529.000000   \n",
      "\n",
      "            PAY_AMT6  default.payment.next.month  \n",
      "count   30000.000000                30000.000000  \n",
      "mean     5215.502567                    0.221200  \n",
      "std     17777.465775                    0.415062  \n",
      "min         0.000000                    0.000000  \n",
      "25%       117.750000                    0.000000  \n",
      "50%      1500.000000                    0.000000  \n",
      "75%      4000.000000                    0.000000  \n",
      "max    528666.000000                    1.000000  \n",
      "\n",
      "[8 rows x 25 columns]\n",
      "0    23364\n",
      "1     6636\n",
      "Name: default.payment.next.month, dtype: int64\n"
     ]
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYkAAAGMCAYAAAAvEF4OAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAbX0lEQVR4nO3de5xdVX338c+PhGgIAkFiCmqIIKIoxEvkokFCRR5QWy2gUKSKSkGrVqmXR5QqVArY+vjgBav4UAEVKnihlFYLFnkJiD4m1gJaUKtcRNGgQLjVIPz6x1rHnAyzJieZOWfOzHzer9e8cvba++y9dibZ37PW2nudyEwkSRrNJpNdAUnS8DIkJElNhoQkqcmQkCQ1GRKSpCZDQtNeRMyOiE1GlM2brPpIU4khoZngJcA/dhYiYlvgqok+SET8MCL+YIz1+0fEjiPK3h4RHxjjPTtExPNHlL2unsP66vPeiDiil7pLLbMnuwLSALwE+GJEnNpVdndE/A3wUF2+APgBcDtwbS1bSPkg9fO6vBh4Q2Z+rnGcB4B7OwsRMRvIzHywFh0I3B8Rx1P+7z0A/Ba4r24ftfzBzOzU6/W1Xv9Wt3kk8JfAZ0YePCLOBHbqOqfHA5tFxFF1eRZwe2b+UaP+0sOED9NpOouIbYBbgJ2BJ1Auyg8BCUT9mU25EN8B3JCZi+t73wY8MjNPqstnARdl5hcbx7oO2IJ60QfmAH+amZ0L/J7Ap4BXAmfWuiygXLx/3lWX12bmt2tr4dPAC4AXAd+gBM1xwFldh/4/mZkRsQNwX2beFhFPAv5ffe8mmXl/RDyVEkDXb/jfpGYqQ0LTWu3KORZ4LLAH8C7gN5SgCGBT4BuZ+baImAOsBlbUtz+W0pK4pS7vBByTmRc2jnUd8MbMvLyxfhNgYWb+vB7rt8CfA1tl5gkRMYtyQX8gIh4B/BPwTuBm4JvAPsCXgHOBe+puT83Mber+/wR4K3AQ8FHgZOAZwDLgeOBC4KTMPK/Hvz7JkND0FRFPo1wYf0m5cK6i/Jv/bdc2mwCz6oV5NvCj9bQkLszMC+u2jNjXw0Kis39gV+CJlO6eyyLic5TuoN+jtB5urn9elJknR8RjgO/X8icBrwLmAS/OzJd37f/mzFzUtbwH8GRKMP6mnvOhteyRmTnhYzGa3hy41nS2KfBeYE1d/gNgZUSsjIgVEbESWAm8v66ftQH7fgXw/Yi4vvNDaWmcM6LsekprYHPKp/r31bGHIzLzOcCHgb/PzGWZuWdmngyQmb+sLYQ3UYLjC5RWzgcj4o1d9eiMPxARj6IEwg3AacB3gaOBrwGPBE6NiF034BwlWxKa/iLicuCwzLxtPdttQfnkPdbA9VHr6W7638B3M/PWUdY/nXLxPhS4lDJw/RhKOP2MMoYxLzN3rNvvCHwBeC2la2w1cCPwLUrwXEoZQ9mpdlWtBD4G3A+8gzI2chdlrOUW4PPAH2fmm8f6e5C6eXeTZox6K+l5lG6cjm2ASzLzLyhjEN/MzH3q9qN1N63Pk4EPRMTnKeMF947cIDN/AexW93kssGUdk9gT+OuuTd8IbEkZR7kJOC8z19Sxh12AzVh7N1UA+2bmHXX50xFxEnAb8Angocx8sLaiZnd3k0ljMSQ0kzwAXJaZh3UKIuKlwJ51cQ/Kp/Hx6Hyafz/w3YhYmpl3jbH9JsAmEfFKStfTiZ0VmXkscGwd/9i5q/w64LqIWAjcXYtfBLwzIn7Tte8nULraDgEovVzMAk4H/mE8J6mZw5DQTDCH8kk7gf0jYkXXuq0oXTpQuoFO6lq3KfX/SEQsAXZk7e2to+kMgt8P/HlEbDMiIF4E7BYR76d0/VwL/IQyIH0BcEG9VTUoYxjnUVo39wLXAH/f2VHdZmfgToDM/Ee6Hhis25wE3JaZHx2jztKYDAnNBPMpQbEppWtpZEvieRHxOGD2iLt/vlffA2Wg+hrgyjGOsynl4g5AZt7edZz9gBcD2wHPAd4APIvyXMVs4BRgdkRsBhyemRdHxAeBqzKzu3XQcQZrb20dqz6bjrFeWi8HrqUBiYhHZebdo5R3ntfoPCfxsHEMabIYEpKkJp+T0JRTP3lrEvk7mDkMCU0pEXEo5QEzImL3iHjWiPUvq7eftt4/PyIOGVF2eETs3HrPZIuIYyLihevZZlZEHNk9JXpEvDAidu9Ttd5Ufxea5gwJTRkRsQg4Bvh4LdoLeGsUj6gXyHVmVY3yXRLdT1IfTnnyudsJwH9PUB33iYivTsS+uvyA8szEWA4AXtA1eyyUeZzmjPfgEfHciPhBRPwkIg6oxZ8AXh8R2493/xpujkloyqgPs304M79Tl7elTHuxBLiY8kzAFpTbWm+i3PY6C3hvnW9pHvAVYH9KwPwA2J5y++kHuw71yfU829Cq3+6UCfh+mJnLN+IUO/s5hnK30y/H2Gxnyu22D9VwvAQ4gjIf1D2U22a/TwnAjvMz8+YNrMsc4MeUmWv/A7gM2Csz74uIZ1OmTj9yQ/apqcWWhKaEOivq9p2AAMjMnwNPzcwbKa2DpcDbgIsz81l1+dk1IILSAvlb4EHKLKlbA39BmVL7zvpzDCVoNsbrKd/1MF6/Bc7JzCd3/1CmCD+ivs76A/A+yvxOt1HCbhHwZuBs1p7XwZRA3FB7ALdk5mWZ+SvK91osA8jMbwM71N+Npimfk9BUsQPwn52FOoawhNK19FPKxX8PSiti64i4ktKKWEEZw5gNPI8yHcanKNNfzAcel5kHd+33dWx819NrKNN5j9c3KM9ojHQAcDXlnLrHGl4AzIuId1Ie0vsJ8IfA0zrPWNRuoo05r8dRng/puIkym+0ldfl7lCe7/Y6KacqWhKaK+ZSJ6jrmUj4Zf6Iuv4PykNpxwJfrrKp7ZeabADLzgczcnjJl+L9TJtp7iDKm8a6u/c5j7Xc1EBGHRsSJ9CAnru/2s5S5l66PdWeUfTlwckTcAFxEeRiPzNwd2JsSDscBj6C0iI7rTGk+ynntHxEfqN1JY5lFmViw415KEHfcQfndaJoyJDRV/Ap4dGchM78LfJIyHxOUmVFXAB8AXlynAl8REWs6d/xE+Y6Gz1D6+5cAd2XmlcCyiPjTup85dVqNjgMpT0cPTGY+k9I99LwR3U2fBd6RmTtn5naZuQIgIjanfIPd6ZQW1yMz8zLKLLadgNuMtfM8QQnUXga272DdUJhL1/TklAkSf7URp6kpwpDQVPFj4CmtlZn5jDoO8XbKmMRSyjQYt3Td8XM4pfXxFuBIyhf+QOkm6kxfsU5rIDOP7Hzz24A9DfhKnb58ffajdD+9itKC2KGWv5W1rYB1WhKZeUJmRmbew9hWUrrxOpZSuvc6nkppwWiaMiQ0JWTmA8B/RsSy9WzamVV1X0o/+e8mvcvM0zJzO+AllE/dv6rlt2Xmx+pm3Z+SN6i7qSUiFtUuo3f0+p7MPJdy19WxPWx7YWYuoIxZfIB6Ec/M+zKz84VK67Qkeu1uqoPhqyLi+Ig4mDJJ4aV1H/sA19bfjaYpB641lbwX+HxEvLDOgfQiYJuI+DvK9z5/B/gF5VbNK4BFmbm66+ngcym3jq4BrqNruuy6zeMpX/nZ7UBKi+S946j3nHrcx2zImzLzlFq3WZTbebdiREunrv8Q8Ny6bp1grOc1j/KdFQ92va3T3XQCa7+5r+UI4COULrBXZ+YvagvnZMoYj6Yxn5PQlBLli4N2BL5OuRjuReleeU19vTVrv4Z0FuUT9Lsy8/TaCvmPxiR776E8C3BaP6bWjoijKJP3nbER730JpVVxLfCyzPzxiPW7A/9Vb1Ed+d4jKeMSn8nMd29M3Rt1OobyfeD/NlH71HAyJDRltWZVres63wWxCXD/iCeRB6oOnJ8CnDBiULzX9wdM6N1TUs8MCUlSkwPXkqQmQ0KS1DSt7m7aZpttcvHixZNdDUmaUlauXHl7vY36YaZVSCxevJgVK1asf0NJ0u9ExE2tdXY3SZKaDAlJUpMhIUlqMiQkSU2GhCSpyZCQJDUZEpKkJkNCktRkSEiSmgwJSVKTISFJajIkJElNhoQkqWlazQI7EZ719nMmuwoaQiv/9pWTXQVpUtiSkCQ1GRKSpCZDQpLUZEhIkpoMCUlSkyEhSWoyJCRJTYaEJKnJkJAkNRkSkqQmQ0KS1GRISJKaDAlJUpMhIUlqMiQkSU2GhCSpyZCQJDUZEpKkJkNCktRkSEiSmgwJSVKTISFJajIkJElNhoQkqcmQkCQ1GRKSpCZDQpLUZEhIkpoMCUlSkyEhSWoyJCRJTYaEJKnJkJAkNRkSkqQmQ0KS1GRISJKaDAlJUpMhIUlqMiQkSU2GhCSpyZCQJDUZEpKkpgkPiYjYMiK+HBGXRMSXImJORJwZEVdHxPFd2210mSRpMPrRkngF8MHM3B+4DTgMmJWZewE7RMROEXHQxpb1ob6SpIbZE73DzPxY1+IC4AjgtLp8CbAMeAZw/kaW/bD7eBFxNHA0wKJFiybwTCRJfRuTiIi9gPnALcCttfjXwEJg3jjK1pGZZ2Tm0sxcumDBgj6ciSTNXH0JiYjYGvgI8BrgHmBuXbV5PeZ4yiRJA9KPges5wAXAcZl5E7CS0k0EsAS4cZxlkqQBmfAxCeC1wDOBd0fEu4FPAX8SEdsBBwJ7AglcsZFlkqQBmfCWRGb+XWbOz8zl9edsYDnwTWDfzLwrM1dvbNlE11eS1NaPlsTDZOYdrL1LadxlkqTBcCBYktRkSEiSmgwJSVKTISFJajIkJElNhoQkqcmQkCQ1GRKSpCZDQpLUZEhIkpoMCUlSkyEhSWoyJCRJTYaEJKnJkJAkNRkSkqQmQ0KS1GRISJKaDAlJUpMhIUlqMiQkSU2GhCSpyZCQJDUZEpKkJkNCktRkSEiSmgwJSVKTISFJajIkJElNhoQkqcmQkCQ1GRKSpCZDQpLUZEhIkpoMCUlSkyEhSWoyJCRJTYaEJKnJkJAkNRkSkqQmQ0KS1GRISJKaDAlJUpMhIUlqMiQkSU2GhCSpyZCQJDUZEpKkJkNCktRkSEiSmgwJSVKTISFJajIkJElNhoQkqcmQkCQ1GRKSpCZDQpLU1JeQiIiFEXFFff3YiPhpRFxefxbU8jMj4uqIOL7rfT2VSZIGY8JDIiLmA2cD82rRHsBfZ+by+rMqIg4CZmXmXsAOEbFTr2UTXV9JUls/WhIPAocCq+vynsBREfGdiDi5li0Hzq+vLwGWbUDZOiLi6IhYERErVq1aNaEnIkkz3YSHRGauzsy7uoq+TLnYPxvYKyJ2o7Qybq3rfw0s3ICykcc7IzOXZubSBQsWTPDZSNLMNnsAx/hGZv4GICL+HdgJuAeYW9dvTgmrXsskSQMyiIvuv0bEthGxGbA/cB2wkrVdR0uAGzegTJI0IINoSZwIfA1YA3w8M2+IiJ8DV0TEdsCBlHGL7LFMkjQgfWtJZOby+ufXMvPJmblbZn60lq2mjFN8E9g3M+/qtaxf9ZUkPdwgWhKjysw7WHvn0gaVSZIGw4FgSVKTISFJajIkJElNhoQkqcmQkCQ1GRKSpCZDQpLUZEhIkpoMCUlSkyEhSWoyJCRJTYaEJKnJkJAkNRkSkqQmQ0KS1GRISJKaDAlJUpMhIUlq6ikkImKTiNgiImZHxL4R8ah+V0ySNPl6bUlcADwP+L/AUcCX+lYjSdLQ6DUkHp2ZFwM7ZeYrgLl9rJMkaUj0GhJ3R8SFwMqIeCFwdx/rJEkaErN73O5lwC6Z+Z2IWAIc2sc6SZKGRE8ticz8b2BNRPwvYA3wYF9rJUkaCr3e3fQR4ETgFGAH4Nx+VkqSNBx6HZPYNTMPBu7MzH8GtuxjnSRJQ6LXkFgVEe8B5kfEq4Db+lgnSdKQ6DUkXgncBVxNaUUc2a8KSZKGR68h8TLgDuBbwJ11WZI0zfUaElF/5gIHUZ6+liRNcz09J5GZZ3ctfjwiPtan+kiShkhPIRER3S2HBcAu/amOJGmY9PrE9b5dr9cAb+hDXSRJQ6bX7qYT+10RSdLw8UuHJElNY7YkIuJrQI4sBjIzf79vtZIkDYUxQyIz9x1rvSRperO7SZLU1OvdTUTEAtZ+I91jM/Pq/lRJkjQsen1O4kzgCcB84D7KOMWyPtZLkjQEeu1ueiJwAPAjYB/gob7VSJI0NHoNifuA5wOzKJP7ze9bjSRJQ6PXkPgs8FPgWOApwJ/1rUaSpKHR68D144DDKdOFXwR8p281kiQNjZ5aEpl5ama+EHgd8CTgpr7WSpI0FHq9u+kPgQMpLYpvA3v3s1KSpOHQa3fT04APZuYP+1kZSdJw6XUW2JP7XRFJ0vBxWg5JUpMhIUlqMiQkSU2GhCSpyZCQJDUZEpKkJkNCktRkSEiSmgwJSVKTISFJaupLSETEwoi4or7eNCL+KSKuiojXjLdMkjQ4Ex4SETEfOBuYV4veBKzMzOcCh0TEo8ZZJkkakH60JB4EDgVW1+XlwPn19deBpeMsW0dEHB0RKyJixapVqybuLCRJEx8Smbk6M+/qKpoH3Fpf/xpYOM6ykcc7IzOXZubSBQsWTOSpSNKMN4iB63uAufX15vWY4ymTJA3IIC66K4Fl9fUS4MZxlkmSBqTXb6Ybj7OBf4mIvYFdgG9RupA2tkySNCB9a0lk5vL6503AC4CrgP0y88HxlPWrvpKkhxtES4LM/Blr71Iad5kkaTAcCJYkNRkSkqQmQ0KS1GRISJKaDAlJUpMhIUlqMiQkSU2GhCSpyZCQJDUZEpKkJkNCktRkSEiSmgwJSVKTISFJajIkJElNhoQkqcmQkCQ1GRKSpCZDQpLUZEhIkpoMCUlSkyEhSWoyJCRJTYaEJKnJkJAkNRkSkqQmQ0KS1GRISJKaDAlJUpMhIUlqMiQkSU2GhCSpyZCQJDUZEpKkJkNCktRkSEiSmgwJSVKTISFJapo92RWQ1Jub/2rXya6ChtCi91zb1/3bkpAkNRkSkqQmQ0KS1GRISJKaDAlJUpMhIUlqMiQkSU2GhCSpyZCQJDUZEpKkJkNCktRkSEiSmgwJSVKTISFJajIkJElNhoQkqcmQkCQ1GRKSpKa+h0REzI6ImyPi8vqza0ScGBHfjojTu7brqUySNDiDaEnsBpyXmcszczkwB1gG7A78MiL2i4hn9VI2gLpKkrrMHsAx9gReHBH7AtcCNwBfyMyMiH8FDgTu6rHsqyN3HhFHA0cDLFq0aACnI0kzxyBaEt8G9svM3YFNgbnArXXdr4GFwLweyx4mM8/IzKWZuXTBggX9OQNJmqEG0ZK4JjN/U1+vYG1QAGxOCap7eiyTJA3QIC68n46IJRExC3gppYWwrK5bAtwIrOyxTJI0QINoSfwVcC4QwEXAScAVEfEh4ID6cxNwSg9lkqQB6ntIZOZ1lDucfqfeqfQi4EOZ+ZMNKZMkDc4gWhIPk5n3A5/fmDJJ0uA4GCxJajIkJElNhoQkqcmQkCQ1GRKSpCZDQpLUZEhIkpoMCUlSkyEhSWoyJCRJTYaEJKnJkJAkNRkSkqQmQ0KS1GRISJKaDAlJUpMhIUlqMiQkSU2GhCSpyZCQJDUZEpKkJkNCktRkSEiSmgwJSVKTISFJajIkJElNhoQkqcmQkCQ1GRKSpCZDQpLUZEhIkpoMCUlSkyEhSWoyJCRJTYaEJKnJkJAkNRkSkqQmQ0KS1GRISJKaDAlJUpMhIUlqMiQkSU2GhCSpyZCQJDUZEpKkJkNCktRkSEiSmgwJSVKTISFJajIkJElNhoQkqcmQkCQ1GRKSpCZDQpLUZEhIkpoMCUlSkyEhSWoyJCRJTVMiJCLizIi4OiKOn+y6SNJMMvQhEREHAbMycy9gh4jYabLrJEkzRWTmZNdhTBHxYeArmfkvEXEYMDczP9W1/mjg6Lq4M3DDJFRzutoGuH2yKyGNwn+bE2v7zFww2orZg67JRpgH3Fpf/xp4ZvfKzDwDOGPQlZoJImJFZi6d7HpII/lvc3CGvrsJuAeYW19vztSosyRNC1PhgrsSWFZfLwFunLyqSNLMMhW6my4EroiI7YADgT0nuT4zid14Glb+2xyQoR+4BoiI+cALgK9n5m2TXR9JmimmREhIkibHVBiTkCRNEkNCo/Ipdw2ziFgYEVdMdj1mAkNCD+NT7hpmdYzybMozVOozQ0KjWQ6cX19fwtpbkKVh8CBwKLB6sisyExgSGs3Ip9wXTmJdpHVk5urMvGuy6zFTGBIajU+5SwL8z6/R+ZS7JGBqPHGtwfMpd0mAD9OpwafcJYEhIUkag2MSkqQmQ0KS1GRISJKaDAkNXEScEBHLG+u2jIjLIuLyiPijjdj35SOWnx4RT9+4mvbfIOo3oGMsj4jFXctndS9r6jIkNGyWAN/IzOWZ+aUJ2N/T68+wGkT9BnGM5cDiPh9Dk8C7mzQQ9ZbaC4BZQAB/A7wGeAxwbWa+ISLeDLwa2IryAN/LgPuBz1OmCvlRZr46Ik4ALs/MyyPiSIDMPKse5/LMXF5fnwJ0WiO3ZubzG3U7AdgD2AxYBRwGJPBpYHvgduAQ4AhgYWaeWo/7e8ABwM9YO3XJ/wfeB5wz4txOADYF9ga2qO978wbUb+R7V49yjGV1n4cCVwEvB/6sx2OsBH4JrKnn9SnK/F1nU34fKzPzLRFxFvBjyu3Rs4DnAx8D9gXuBL6Xma8YbbvMvH+0Y2u42ZLQoBwNXJyZ+wIPAE8CrsvM5wHbRsRumfkh4C3AWbUlsQrYFvgIsB+wOCJ6nkcqM48DTgVObV0cu1yRmfsAvwBeAjwa+GdgH8oF+ZmUsHpp3f4Q4DP19buAx9f37VnPdZ1zq9s9sZZ9Efj9DazfOu8d7RiZeSVlSpWPAhdm5i0bcIzNKKG8G3A4JTTfBfxDZu4NbBkRB9RtN69l1wPPyMxXA2cBb8nMV3Ttc53t1nN+GlI+ca1BeQLwufp6BXAMsKaOTWwFPBa4ZpT3PQAcRWlhbM3aOaU65lJaG+O1sv55DaXb5DLgxZQweAwwNzPvjoj/ioh9gE0y86cRQWbeGBE/y8x7IiKAnYHnjDg3KJ/8AW4G5mxg/Ua+d7RjXAOcDlwNLNjA/f+i1v8myiyrAewCfLyu/xbwlPr67B7Po9ftNMRsSWhQbgaeWl8/HfgEcFrtGjq+rh/Naymf4P8YuLeWrWHtRfCA0d7U5X7Kp2TqBbxl9/rnM4AfAQcB19U/b+3a7hxKV8x5Y+zrBkY/t3tH2bbX+o18b+sYxwOnAH+5EccY6XusnZJlz7o8Wl1axxhtO00xhoQG5Qzg4Hr30RbAD4EDI+LrwOuAWxrvuxQ4jvLJHson5ouAN0XEx4Ffree4lwIHRcRVwN4RsUtEnDTKds+uddsKuJjSp38ocCWlBdNpDXyVMj7wxTGO+ckez21D6rfeY0TEIcDPMvM9wFMj4pnjPMYpwGERcSVwZ2ZeMsa2XwDeGRHfBHbsYd+aIhy41ozXPRC+nu22pgTIJZl5Qv9rJk0+Q0KS1GR3kySpyZCQJDUZEpKkJkNCktRkSEiSmv4H1+JmfDSIXVQAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 432x432 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "%matplotlib inline  #添加这行代码，否则图表画不出来\n",
    "import pandas as pd\n",
    "from matplotlib import pyplot as plt\n",
    "import seaborn as sns\n",
    "data=pd.read_csv(r\".\\UCI_Credit_Card.csv\")\n",
    "print(data.shape)\n",
    "print(data.describe())\n",
    "next_month=data['default.payment.next.month'].value_counts()\n",
    "print(next_month)\n",
    "df=pd.DataFrame({'default.payment.next.month':next_month.index,'values':next_month.values})\n",
    "plt.rcParams['font.sans-serif']=['SimHei']\n",
    "plt.figure(figsize=(6,6))\n",
    "plt.title('信用卡违约率\\n (违约：1，守约：0)')\n",
    "sns.set_color_codes('pastel')\n",
    "sns.barplot(x='default.payment.next.month',y='values',data=df)\n",
    "locs,labels=plt.xticks()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "通过初步观察我们发现每个字段都没有空值，所以无需对空值进行处理。但是我们发现了一些异常值，比如BILL_AMT等字段竟然有负值，这不符合实际情况，你的账单上不会有负值出现。并且结合对csv表的观察，我们还发现BILL_AMT和PAY_AMT字段竟然存在全为0的情况，这说明这行数据对我们分析毫无作用，无借款无还款说明不存在违约不违约这一说法，针对这些数据应给予删除。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 数据清洗"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(22384, 25)\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "C:\\Users\\Brown\\Anaconda3\\lib\\site-packages\\pandas\\core\\frame.py:4102: SettingWithCopyWarning: \n",
      "A value is trying to be set on a copy of a slice from a DataFrame\n",
      "\n",
      "See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",
      "  errors=errors,\n"
     ]
    }
   ],
   "source": [
    "data1=data[(data['BILL_AMT1']>=0) & (data['BILL_AMT2']>=0) & (data['BILL_AMT3']>=0 )& (data['BILL_AMT4']>=0) & (data['BILL_AMT5']>=0) & (data['BILL_AMT6']>=0)]\n",
    "#去除负值行\n",
    "data2=data1[(data1['BILL_AMT1']!=0) & (data1['BILL_AMT2']!=0) & (data1['BILL_AMT3']!=0 )& (data1['BILL_AMT4']!=0) & (data1['BILL_AMT5']!=0) & (data1['BILL_AMT6']!=0)]\n",
    "#去除值全为0的行\n",
    "print(data2.shape)\n",
    "data2.drop(['ID'],inplace=True,axis=1)  #ID对我们进行数据分析无用，所以删去\n",
    "target=data2['default.payment.next.month'].values  #结果集\n",
    "columns=data2.columns.tolist()\n",
    "columns.remove('default.payment.next.month')\n",
    "features=data2[columns].values   #特征值"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 模型搭建\n",
    "\n",
    "***完整代码如下：***"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(22384, 25)\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "C:\\Users\\Brown\\Anaconda3\\lib\\site-packages\\pandas\\core\\frame.py:4102: SettingWithCopyWarning: \n",
      "A value is trying to be set on a copy of a slice from a DataFrame\n",
      "\n",
      "See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",
      "  errors=errors,\n",
      "C:\\Users\\Brown\\Anaconda3\\lib\\site-packages\\sklearn\\model_selection\\_split.py:1978: FutureWarning: The default value of cv will change from 3 to 5 in version 0.22. Specify it explicitly to silence this warning.\n",
      "  warnings.warn(CV_WARNING, FutureWarning)\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "GridSearch最优参数： {'svc__C': 1, 'svc__gamma': 0.01}\n",
      "GridSearch最优分数：0.8276\n",
      "准确率为0.8261\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "C:\\Users\\Brown\\Anaconda3\\lib\\site-packages\\sklearn\\model_selection\\_split.py:1978: FutureWarning: The default value of cv will change from 3 to 5 in version 0.22. Specify it explicitly to silence this warning.\n",
      "  warnings.warn(CV_WARNING, FutureWarning)\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "GridSearch最优参数： {'decisiontreeclassifier__max_depth': 6}\n",
      "GridSearch最优分数：0.8209\n",
      "准确率为0.8193\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "C:\\Users\\Brown\\Anaconda3\\lib\\site-packages\\sklearn\\model_selection\\_split.py:1978: FutureWarning: The default value of cv will change from 3 to 5 in version 0.22. Specify it explicitly to silence this warning.\n",
      "  warnings.warn(CV_WARNING, FutureWarning)\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "GridSearch最优参数： {'randomforestclassifier__n_estimators': 6}\n",
      "GridSearch最优分数：0.8046\n",
      "准确率为0.8016\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "C:\\Users\\Brown\\Anaconda3\\lib\\site-packages\\sklearn\\model_selection\\_split.py:1978: FutureWarning: The default value of cv will change from 3 to 5 in version 0.22. Specify it explicitly to silence this warning.\n",
      "  warnings.warn(CV_WARNING, FutureWarning)\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "GridSearch最优参数： {'kneighborsclassifier__n_neighbors': 8}\n",
      "GridSearch最优分数：0.8085\n",
      "准确率为0.8029\n"
     ]
    }
   ],
   "source": [
    "import pandas as pd\n",
    "from  sklearn.model_selection  import learning_curve , train_test_split ,GridSearchCV\n",
    "from sklearn.preprocessing import StandardScaler\n",
    "from sklearn.pipeline import Pipeline\n",
    "from sklearn.metrics import accuracy_score\n",
    "from sklearn.svm import SVC\n",
    "from sklearn.tree import DecisionTreeClassifier\n",
    "from sklearn.ensemble import RandomForestClassifier\n",
    "from sklearn.neighbors import KNeighborsClassifier\n",
    "data=pd.read_csv(r\".\\UCI_Credit_Card.csv\")\n",
    "data1=data[(data['BILL_AMT1']>=0) & (data['BILL_AMT2']>=0) & (data['BILL_AMT3']>=0 )& (data['BILL_AMT4']>=0) & (data['BILL_AMT5']>=0) & (data['BILL_AMT6']>=0)]\n",
    "#去除负值行\n",
    "data2=data1[(data1['BILL_AMT1']!=0) & (data1['BILL_AMT2']!=0) & (data1['BILL_AMT3']!=0 )& (data1['BILL_AMT4']!=0) & (data1['BILL_AMT5']!=0) & (data1['BILL_AMT6']!=0)]\n",
    "#去除值全为0的行\n",
    "print(data2.shape)\n",
    "data2.drop(['ID'],inplace=True,axis=1)  #ID对我们进行数据分析无用，所以删去\n",
    "target=data2['default.payment.next.month'].values  #结果集\n",
    "columns=data2.columns.tolist()\n",
    "columns.remove('default.payment.next.month')\n",
    "features=data2[columns].values   #特征值\n",
    "train_x,test_x,train_y,test_y=train_test_split(features,target,test_size=0.25,stratify=target,random_state=1)\n",
    "#将25%的数据当成测试集，其余为训练集\n",
    "classifiers=[SVC(random_state=1,kernel='rbf'),DecisionTreeClassifier(random_state=1,criterion='gini'),\n",
    "             RandomForestClassifier(random_state=1,criterion='gini'),KNeighborsClassifier(metric='minkowski')]\n",
    "classifier_names=['svc','decisiontreeclassifier','randomforestclassifier','kneighborsclassifier']\n",
    "classifier_param_grid=[{'svc__C':[1],'svc__gamma':[0.01]},\n",
    "                       {'decisiontreeclassifier__max_depth':[6,9,11]},\n",
    "                       {'randomforestclassifier__n_estimators':[3,5,6]},\n",
    "                       {'kneighborsclassifier__n_neighbors':[4,6,8]}]\n",
    "#上面代码是为了进行pipeline做准备，管道机制可以将固定数据分析的固定流程封装起来，减少代码冗余\n",
    "def GridSearchCV_work(pipeline,train_x,train_y,test_x,test_y,param_grid,score='accuracy'):\n",
    "    response={}\n",
    "    gridsearch=GridSearchCV(estimator=pipeline,param_grid=param_grid,scoring=score)\n",
    "    search=gridsearch.fit(train_x,train_y)\n",
    "    print('GridSearch最优参数：',search.best_params_)\n",
    "    print('GridSearch最优分数：%0.4f'%search.best_score_)\n",
    "    predict_y=gridsearch.predict(test_x)\n",
    "    print('准确率为%.4f'%accuracy_score(test_y,predict_y))\n",
    "    return response\n",
    "#GridSearchCV 能帮助我们挑选出estimators中最优参数个数，当然参数个数范围还是需要我们指定\n",
    "for model,model_name,model_param_grid in zip(classifiers,classifier_names,classifier_param_grid):\n",
    "    pipeline=Pipeline([('scaler',StandardScaler()),\n",
    "                      (model_name,model)])\n",
    "    result=GridSearchCV_work(pipeline,train_x,train_y,test_x,test_y,model_param_grid,score='accuracy')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "通过GridSearchCV和Pipeline，我们便捷地使用分类器提高了效率，迅速地比较了各种分类器准确成俗，并且从上面我们能看到，在这几种算法当中SVC准确率最高达到0.8261。"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
